Data Visualization with R

require(ggplot2)
require(dplyr)
require(qdata)
require(MASS)
data(bands)

ggplot2 does not support true 3d surfaces. However, it does support several common tools for representing 3d surfaces in 2d: contours, coloured rasters (tiles) and bubble plots. These all work similarly, differing only in the aesthetic used for the third dimension.

This chapter shows how to build contour plots, coloured raster (tile) plots and bubble plots.

Contours

A contour plot is a graphical technique for representing a 3-dimensional surface by plotting constant z slices, called contours, in a 2-dimensional format. That is, given a value for z, lines are drawn connecting the (x, y) coordinates where that z value occurs.
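
To make the definition concrete, here is a minimal sketch that uses base R's contourLines() on the built-in volcano matrix (both are stand-ins chosen for illustration, not part of the bands example) to extract the (x, y) coordinates at which the surface equals a single z value:

```r
# volcano is a built-in 87 x 61 matrix of heights (the z values)
z_value <- 150

# contourLines() returns one list element per contour line at the requested level
lines_at_z <- contourLines(
  x = seq_len(nrow(volcano)),
  y = seq_len(ncol(volcano)),
  z = volcano,
  levels = z_value
)

# Collect all lines into one data frame, one row per vertex
contour_df <- do.call(rbind, lapply(seq_along(lines_at_z), function(i) {
  data.frame(
    piece = i,
    level = lines_at_z[[i]]$level,
    x = lines_at_z[[i]]$x,
    y = lines_at_z[[i]]$y
  )
}))
head(contour_df)
```

Every row of contour_df is a point where the surface height equals z_value; joining the points within each piece gives one contour line.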

Suppose we want to analyze the relationship between the humidity and viscosity variables of the bands dataset, also taking into account the bivariate density estimate of these two variables.

First of all, let us compute the bivariate density estimate of humidity and viscosity:

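# Remove every row of bands that contains an NA in any column
bands_na_rm <- bands %>% na.omit()

# Compute the bivariate density estimate on a 100 x 100 grid
f2d <- kde2d(bands_na_rm$humidity, bands_na_rm$viscosity, n = 100)

# Build a long-format dataset with one row per grid point.
# expand.grid() varies humidity fastest, which matches the
# column-major order of as.vector(f2d$z)
bands_d <- expand.grid(humidity = f2d$x, viscosity = f2d$y) %>%
  as_tibble() %>%  # tbl_df() is deprecated in current dplyr
  mutate(density = as.vector(f2d$z))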
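
The step above relies on the grid rows and the density matrix being in the same order. A minimal sketch of checking that alignment, using the geyser dataset from MASS as a stand-in for bands (the cell indices i and j are arbitrary):

```r
library(MASS)  # provides kde2d() and the geyser dataset

# Small grid so the check is easy to inspect
f <- kde2d(geyser$duration, geyser$waiting, n = 25)

# Same construction as in the text: the first variable varies fastest
grid <- expand.grid(duration = f$x, waiting = f$y)
grid$density <- as.vector(f$z)

# Pick an arbitrary cell and compare the long-format value with the matrix entry
i <- 3
j <- 7
row_ij <- grid$duration == f$x[i] & grid$waiting == f$y[j]
aligned <- isTRUE(all.equal(grid$density[row_ij], f$z[i, j]))
aligned
```

If the two variables were swapped in expand.grid(), this check would fail for most cells, and the plotted surface would be transposed.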

Contours can be plotted with the geom_contour() function:

ggplot(bands_d, aes(humidity, viscosity, z = density)) + 
  geom_contour()

The third dimension is mapped to the z aesthetic in the ggplot() call. In this case, z is set to density.

You can use the bins argument to set the number of contour bins; the contour levels are evenly spaced over the range of the data:

ggplot(bands_d, aes(humidity, viscosity, z = density)) + 
  geom_contour(bins = 2)

ggplot(bands_d, aes(humidity, viscosity, z = density)) + 
  geom_contour(bins = 10)

You can also control the distance between contours with the binwidth argument, which sets the width of the contour bins, that is, the difference in z between consecutive contour lines:

ggplot(bands_d, aes(humidity, viscosity, z = density)) + 
  geom_contour(binwidth = 0.0001)
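
Since binwidth is expressed in the units of z, a suitable value depends on the range of the density. A minimal sketch of deriving it from the data, using the faithfuld dataset shipped with ggplot2 as a stand-in for bands_d and a hypothetical target of about eight bins:

```r
library(ggplot2)

# faithfuld ships with ggplot2 and already contains a density column
z_range <- range(faithfuld$density)

# Hypothetical target: split the range of z into about 8 bins
target_bins <- 8
bw <- diff(z_range) / target_bins

p <- ggplot(faithfuld, aes(waiting, eruptions, z = density)) +
  geom_contour(binwidth = bw)

# Contour levels actually drawn, read back from the built plot
levels_drawn <- sort(unique(layer_data(p)$level))
levels_drawn
```

Inspecting levels_drawn is a quick way to see whether a given binwidth produces too few or too many lines before fine-tuning it by hand.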

We can customize the contour plot by changing the colour, the linetype or the line width (the linewidth argument in ggplot2 3.4.0 and later; older versions use size instead):

ggplot(bands_d, aes(humidity, viscosity, z = density)) + 
  geom_contour(colour = "darkgreen", linetype = 6, linewidth = 1)

It is also possible to map the height of the density surface to the colour of the contour lines, by mapping ..level.. to the colour aesthetic:

ggplot(bands_d, aes(humidity, viscosity, z = density)) + 
  geom_contour(aes(colour = ..level..))

..level.. is a variable generated by the statistical transformation that the geom_contour() function uses.

When ..level.. is mapped to colour, a legend is produced automatically.
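
In ggplot2 3.3.0 and later, computed variables can also be referenced with after_stat(), and the double-dot notation is deprecated as of 3.4.0. A minimal sketch of the same mapping in the newer notation, using the faithfuld dataset shipped with ggplot2 as a stand-in for bands_d:

```r
library(ggplot2)

# after_stat(level) refers to the level variable computed by the contour stat
p <- ggplot(faithfuld, aes(waiting, eruptions, z = density)) +
  geom_contour(aes(colour = after_stat(level)))

# The computed variables can be inspected in the built layer data
computed <- names(layer_data(p))
"level" %in% computed
```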

You can also customize the plot by adding another geom that draws surfaces, geom_raster(), in the background in order to strengthen the 3d effect:

ggplot(bands_d, aes(humidity, viscosity, z = density)) + 
  geom_raster(aes(fill = density)) +
  geom_contour(colour = "white")

Raster (Tiles) Plot

A raster (tile) plot is based on a raster, a scan pattern in which an area is scanned from side to side in lines from top to bottom. It can also be defined as a pattern of closely spaced rows of dots that form an image.

As you saw in the previous example, a raster plot is generated by geom_raster().
geom_raster() is a function for drawing rectangles. The most common use for rectangles is to draw a surface. To draw a surface, map the variable that represents the third dimension, in this case density, to the fill aesthetic:

ggplot(bands_d, aes(humidity, viscosity, z = density)) + 
   geom_raster(aes(fill = density))

You can also smooth the surface by setting interpolate = TRUE, which interpolates linearly between cells. This is useful when rendering images.

ggplot(bands_d, aes(humidity, viscosity, z = density)) + 
   geom_raster(aes(fill = density), interpolate = TRUE)

A similar result can be achieved with the geom_tile() function. geom_tile() is similar to geom_raster() in that both draw rectangles. The main difference is that geom_raster() is a special case for tiles that all have the same size, so it renders more efficiently than geom_tile(). In theory they should look the same, but in practice they often do not. If you are writing to a PDF file, the appearance depends on the PDF viewer: in some viewers, geom_tile() output may show faint lines between the tiles, while geom_raster() output may show blurry tile edges.

ggplot(bands_d, aes(humidity, viscosity, z = density)) + 
   geom_tile(aes(fill = density))
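
One practical consequence, not covered by the bands example, is that geom_raster() assumes every cell has the same size, whereas geom_tile() also accepts a width (and height) aesthetic. A minimal sketch with a small made-up data frame (all values are hypothetical):

```r
library(ggplot2)

# Hypothetical cells with unequal widths; x is the centre of each cell
cells <- data.frame(
  x     = c(0.5, 2, 4.5),
  y     = c(1, 1, 1),
  width = c(1, 2, 3),
  value = c(0.2, 0.5, 0.9)
)

# geom_tile() honours the per-cell width mapped in aes()
p <- ggplot(cells, aes(x, y, fill = value)) +
  geom_tile(aes(width = width))

# Horizontal extent of each tile as computed by ggplot2
tile_extents <- layer_data(p)[, c("xmin", "xmax")]
tile_extents
```

For a regular grid such as bands_d, both geoms apply, and geom_raster() is usually the faster choice.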

Bubble Plot

A bubble plot is another type of plot that displays three dimensions of data. Each entity with its triplet (x, y, z) of associated data is plotted as a disk that expresses x and y through the disk’s xy location and z through its size. Bubble charts can be considered a variation of the scatter plot, in which the data points are replaced with bubbles.

It works better with fewer observations, so we take a random sample of the dataset:

small <- bands_d %>% sample_n(size = 500)
ggplot(small, aes(humidity, viscosity)) +
  geom_point(aes(size = density)) +
  scale_size_area()

To generate a bubble plot, map the third variable, in this case density, to the size aesthetic of the geom_point() function. Then add scale_size_area() to make the area of the points proportional to density; in particular, scale_size_area() ensures that a value of 0 is mapped to a size of 0.
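
The behaviour of scale_size_area() can be checked on a tiny made-up dataset by reading the computed point sizes back from the built plot (the data values are hypothetical):

```r
library(ggplot2)

# Hypothetical data: the third variable takes the values 0, 1 and 4
toy <- data.frame(x = 1:3, y = 1:3, z = c(0, 1, 4))

p <- ggplot(toy, aes(x, y)) +
  geom_point(aes(size = z)) +
  scale_size_area()

# Point sizes after scaling; size acts as a linear dimension, so area ~ size^2
sizes <- layer_data(p)$size

# A value of 0 should get size 0
zero_is_zero <- sizes[1] == 0

# Ratio of squared sizes, to compare with toy$z[3] / toy$z[2]
area_ratio <- sizes[3]^2 / sizes[2]^2
```

With the default scale_size() instead, the smallest value would be drawn with a non-zero size, which makes bubble areas harder to compare.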

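You can also specify an alpha level, a colour and a fill for the bubbles as geom_point() settings. The fill only takes effect for point shapes 21 to 25, which have a separate border (colour) and interior (fill), so shape = 21 is set here:

ggplot(small, aes(humidity, viscosity)) +
  geom_point(aes(size = density), shape = 21, alpha = 0.4, colour = "blue", fill = "lightblue") +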
  scale_size_area()
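
Because sample_n() draws rows at random, the bubble plots differ from run to run. A minimal sketch of fixing the random seed so that the sample is reproducible, using the faithfuld dataset shipped with ggplot2 as a stand-in for bands_d (the seed value is arbitrary):

```r
library(ggplot2)
library(dplyr)

set.seed(123)  # arbitrary seed; any fixed value works
small_a <- faithfuld %>% sample_n(size = 500)

set.seed(123)  # resetting the seed reproduces the same rows
small_b <- faithfuld %>% sample_n(size = 500)

same_sample <- identical(small_a, small_b)

# The bubble plot built from the reproducible sample
p <- ggplot(small_a, aes(waiting, eruptions)) +
  geom_point(aes(size = density), alpha = 0.4) +
  scale_size_area()
```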